Chapter 21
Summarizing and Graphing Survival Data
IN THIS CHAPTER
Beginning with the basics of survival data
Generating life tables and trying the Kaplan-Meier method
Applying some handy guidelines for survival analysis
Using survival data for even more calculations
This chapter describes statistical techniques that deal with a special kind of numerical data called
survival data or time-to-event data. These data reflect the interval from a particular starting point in
time, such as the date a patient receives a certain diagnosis or undergoes a certain procedure, to the first
or only occurrence of a particular kind of event that represents an endpoint. Because these techniques
are often applied to situations where the endpoint event is death, we usually call the use of these
techniques survival analysis, even when the endpoint is something less drastic (or final) than death.
Survival data could include the time from resolution of a chronic illness symptom to its relapse, but
the endpoint can also be desirable, such as time to remission of cancer or time to recovery from an
acute condition. Throughout this chapter, we use terms and examples that imply that the endpoint is death,
such as saying survival time instead of time to event. However, everything we say also applies to
other kinds of endpoints.
You may wonder why you need a special kind of analysis for survival data in the first place. Why not
just treat survival times as ordinary numerical variables? Why not summarize them as means, medians,
standard deviations, and so on, and graph them as histograms and box-and-whiskers charts? Why not
compare survival times between groups with t tests and ANOVAs? Why not use ordinary least-squares
regression to explore how various factors influence survival time?
In this chapter, we explain how survival data aren’t like ordinary numerical data and why you need to
use specific techniques to analyze them properly. We describe two ways to construct survival curves:
the life-table and the Kaplan-Meier methods. We guide you in preparing and interpreting survival
curves and show you how to glean useful information from these curves, such as median survival time
and five-year survival rates.
Understanding the Basics of Survival Data
To understand survival analysis, you first have to understand survival data. Survival times are
intervals between a designated starting time point and the time point an event occurs. These intervals
can have a specific type of missing data due to a phenomenon called censoring. Because survival
data usually include censored data, they must be analyzed in a very specific way to avoid generating
biased estimates that lead to incorrect conclusions.
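To make these ideas concrete, here is a minimal Python sketch of how survival data might be represented. The patient records, dates, and names such as follow_up_days are hypothetical and invented purely for illustration. Each record pairs an interval (days from the starting point to the last known date) with a flag that says whether the event was actually observed or the observation was censored.

```python
from datetime import date
from statistics import mean

def follow_up_days(start, end):
    """Length of the interval from the starting point to the last known date."""
    return (end - start).days

# Hypothetical records: (start date, last known date, event observed?)
# event observed = False means the patient was still alive at last contact,
# so the survival time is censored.
records = [
    (date(2020, 1, 1), date(2020, 4, 10), True),
    (date(2021, 1, 1), date(2021, 7, 20), False),
    (date(2021, 3, 1), date(2021, 4, 20), True),
]

# Survival data as (time, event observed) pairs.
survival = [(follow_up_days(s, e), observed) for s, e, observed in records]

event_times = [t for t, observed in survival if observed]
censored_times = [t for t, observed in survival if not observed]

# A censored time is only a lower bound on the true survival time.
# Averaging all times as if every one ended in the event therefore
# cannot overstate the true mean; it can only understate it.
naive_mean = mean(t for t, _ in survival)

# Dropping censored patients is no better: it discards the information
# that they survived at least as long as their follow-up.
events_only_mean = mean(event_times)

print(survival)
print(naive_mean, events_only_mean)
```

Neither of the two averages is a sound estimate of mean survival time, which is why censored data call for methods designed to use the partial information that a patient survived at least as long as the follow-up period.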
Examining how survival times are intervals